32 research outputs found

    Distinct composition and amplification dynamics of transposable elements in sacred lotus (Nelumbo nucifera Gaertn.)

    Sacred lotus (Nelumbo nucifera Gaertn.) is a basal eudicot plant with a unique lifestyle, physiological features, and evolutionary characteristics. Here we report the unique profile of transposable elements (TEs) in the genome, using a manually curated repeat library. TEs account for 59% of the genome, and hAT (Ac/Ds) elements alone represent 8%, more than in any other known plant genome. About 18% of the lotus genome is comprised of Copia LTR retrotransposons, and over 25% of them are associated with non-canonical termini (non-TGCA). Such high abundance of non-canonical LTR retrotransposons has not been reported for any other organism. TEs are very abundant in genic regions, with retrotransposons enriched in introns and DNA transposons primarily in flanking regions of genes. The recent insertion of TEs in introns has led to significant intron size expansion, with a total of 200 Mb in the 28 455 genes. This is accompanied by declining TE activity in intergenic regions, suggesting distinct control efficacy of TE amplification in different genomic compartments. Despite the prevalence of TEs in genic regions, some genes are associated with fewer TEs, such as those involved in fruit ripening and stress responses. Other genes are enriched with TEs, and genes in epigenetic pathways are the most associated with TEs in introns, indicating a dynamic interaction between TEs and the host surveillance machinery. The dramatic differential abundance of TEs with genes involved in different biological processes as well as the variation of target preference of different TEs suggests the composition and activity of TEs influence the path of evolution

    Early selection of bZIP73 facilitated adaptation of japonica rice to cold climates

    Cold stress is a major factor limiting production and geographic distribution of rice (Oryza sativa). Although the growth range of japonica subspecies has expanded northward compared to modern wild rice (O. rufipogon), the molecular basis of the adaptation remains unclear. Here we report that bZIP73, a bZIP transcription factor-coding gene with only one functional polymorphism (+511 G>A) between the two subspecies japonica and indica, may have facilitated japonica adaptation to cold climates. We show that the japonica version of bZIP73 (bZIP73Jap) interacts with bZIP71 and modulates ABA levels and ROS homeostasis. Evolutionary and population genetic analyses suggest bZIP73 has undergone balancing selection; the bZIP73Jap allele was first selected from standing variation in wild rice and likely facilitated cold climate adaptation during initial japonica domestication, while the indica allele bZIP73Ind was subsequently selected for reasons that remain unclear. Our findings reveal that early selection of bZIP73Jap may have facilitated climate adaptation of primitive rice germplasms.

    Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing

    Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species

    Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity

    Background: Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Findings: Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ~7.9 million base pairs (Mb), representing a ~300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ~24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Conclusions: Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions.

    Table_1_Categorizing numeric nutrients criteria and implications for water quality assessment in the Pearl River Estuary, China.docx

    Coastal eutrophication, the over-enrichment of water with nutrients, has become a global ecological problem. Because coastal waters are under great pressure from anthropogenic influences and climate change, establishing numeric nutrient criteria for them is exceedingly complex. To control and improve the water quality of the Pearl River Estuary (PRE), the nutrient criteria of the PRE and adjacent waters were established using frequency statistical analysis of data from 2015 to 2020. Based on the spatiotemporal salinity patterns, the coastal waters of the PRE were divided into three subareas using cluster analysis, namely freshwater (Zone I), mixed (Zone II), and seawater (Zone III). The recommended criteria values of dissolved inorganic nitrogen (DIN) were 0.573, 0.312, and 0.134 mg·L⁻¹ in Zones I, II, and III, respectively. The total nitrogen (TN) criterion for Zone III (0.222 mg·L⁻¹) was much lower than those for Zone I (0.902 mg·L⁻¹) and Zone II (0.885 mg·L⁻¹). The dissolved inorganic phosphorus (DIP) criteria differed among the three zones, ranging from 0.004 to 0.009 mg·L⁻¹, and the recommended total phosphorus (TP) criteria in Zones I, II, and III were 0.039, 0.028, and 0.020 mg·L⁻¹, respectively. In water quality assessment, the categorized numeric nutrient criteria can be referenced and applied to the freshwater, mixed, and seawater zones of the PRE. The results of this study provide a new nutrient reference condition in the PRE, which could be helpful in establishing integrated land-ocean unified nutrient criteria and water quality assessment, and in implementing effective coastal eutrophication control in the future.

    Replaying the evolutionary tape to investigate subgenome dominance in allopolyploid Brassica napus

    Interspecific hybridization and allopolyploidization merge evolutionarily distinct parental genomes (subgenomes) into a single nucleus. A frequent observation is that one subgenome is "dominant" over the other subgenome, having a greater number of retained genes and being more highly expressed. Which subgenome becomes dominantly expressed in allopolyploids remains poorly understood. Here we "replayed the evolutionary tape" with six isogenic resynthesized Brassica napus (rapeseed) allopolyploid lines and investigated subgenome dominance patterns over the first ten generations post merger. We found that the same subgenome was consistently more dominantly expressed in all lines and generations and that >70% of biased gene pairs showed the same dominance patterns across all lines and an in silico hybrid of the parents. Gene network analyses indicated an enrichment for network interactions and several biological functions for the Brassica oleracea derived 'BnC' subgenome biased pairs, but no enrichment was identified for Brassica rapa derived 'BnA' subgenome biased pairs. Furthermore, DNA methylation differences between subgenomes mirrored the observed gene expression bias towards the 'BnC' subgenome in all lines and generations. These methylation patterns were consistent with those previously associated with higher expression, but differ from proposed mechanisms from recent conceptual models and with observations in other polyploid systems that exhibit subgenome dominance. Many of these differences in gene expression and methylation were also found when comparing the progenitor genomes, suggesting subgenome dominance is partly related to parental genome differences rather than just a byproduct of allopolyploidization. 
These findings demonstrate that "replaying the evolutionary tape" in an allopolyploid results in largely repeatable and predictable subgenome expression dominance patterns, partly due to preexisting genetic differences among the parental species.
All intermediate files are present to reproduce figures from the manuscript. BNapusRNASeqPlots.R reads the necessary files, processes them, and produces figures. All file paths are standardized with the here() package in R. Bisulfite-seq analysis may be aided by the GitHub repo https://github.com/niederhuth/Replaying-the-evolutionary-tape-to-investigate-subgenome-dominance
Funding provided by: National Natural Science Foundation of China (Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001809; Award Number: 31471173).
Funding provided by: National Natural Science Foundation of China (Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001809; Award Number: 31871239).
Funding provided by: Directorate for Biological Sciences (Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000076; Award Number: 2029959).
Funding provided by: National Agricultural Statistics Service (Crossref Funder Registry ID: http://dx.doi.org/10.13039/100009174).
Funding provided by: National Science Foundation (Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001; Award Number: 1424871).
Homoeologous exchange analysis
Paired-end 150 bp genomic Illumina reads were filtered with Trimmomatic v 0.33 (Bolger et al. 2014) to remove Illumina TruSeq3 adapters. Trimmed reads were aligned to the in silico B. napus reference genome with Bowtie2 v.2.3.4.1 (Langmead and Salzberg 2012) on default settings with the flag "--very-sensitive-local". BAM files were sorted with bamtools (Barnett et al. 2011) for use in downstream analyses. The MCScan toolkit (Tang et al. 2008) was used to identify syntenic, homologous gene pairs (syntelogs) between Brassica rapa (reference genome R500) and Brassica oleracea (reference genome TO1000; Parkin et al. 2014). 
In the synthetic polyploid these can be thought of as syntenic homoeologs. Bed files based on chromosome and start/stop position information for each subgenome were generated. For all 18 samples (6 individuals x 3 generations), read depth for the A subgenome (BnA) syntenic homoeologs was determined in Bedtools (Quinlan and Hall 2010) with BedCov using the R500 syntelog bed file, and for the C subgenome (BnC) using the TO1000 syntelog bed file. In R v 3.4.1, read depths for each syntenic homoeolog were normalized to reads per million for the subgenome of origin, and the ratio of reads mapping to a syntenic homoeolog compared to the overall read mapping for a syntenic homoeolog pair was averaged over a window of 50 genes with a step of one gene. Homoeologous exchanged regions were identified by calculating average read depth for the BnC subgenome along a sliding window of 170 genes (85 upstream and 85 downstream) with a step size of one. If 10 or more consecutive genes had a read depth within a pre-selected range, the region was called a homoeologous exchange. Regions with 0 ≤ read depth < 0.2 were predicted to be in a 0BnC-to-4BnA ratio; 1BnC-to-3BnA was predicted for 0.2 ≤ read depth < 0.4, 2BnC-to-2BnA for 0.4 ≤ read depth < 0.6, 3BnC-to-1BnA for 0.6 ≤ read depth < 0.8, and 4BnC-to-0BnA for 0.8 ≤ read depth < 1.
RNASeq analysis
Raw RNA-seq reads were filtered with Trimmomatic v 0.33 (Bolger et al. 2014) to remove Illumina TruSeq3 adapters and mapped to the in silico reference using STAR v 2.6.0 (Dobin et al. 2013) on default settings. Transcripts were quantified in transcripts per million (TPM) from RNAseq alignments using StringTie v 1.3.5 (Pertea et al. 2015). Because the syntelogs in the progenitor genomes are in the subgenomes of the synthetic polyploids, they can be thought of as syntenic homoeologs. To avoid dosage imbalance, only syntenic homoeologs determined to be at a 2:2 dosage balance were analyzed for homoeolog expression bias. 
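The dosage bins and the "10 or more consecutive genes" rule described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' R code: the function names are hypothetical, the input `ratios` is assumed to be the already window-averaged BnC read-depth ratio per gene in chromosomal order, and treating runs in the 2BnC-to-2BnA bin as balanced (not reported as exchanges) is an assumption.

```python
from typing import List, Optional, Tuple

# Bin edges from the text: [0, 0.2) -> 0 BnC copies, ..., [0.8, 1) -> 4 BnC copies.
BIN_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def dosage_class(ratio: float) -> Optional[int]:
    """Return the predicted number of BnC copies (0-4) for a BnC depth ratio."""
    for copies in range(5):
        if BIN_EDGES[copies] <= ratio < BIN_EDGES[copies + 1]:
            return copies
    return None  # outside [0, 1); the text leaves a ratio of exactly 1 unassigned


def call_exchanges(ratios: List[float], min_run: int = 10) -> List[Tuple[int, int, int]]:
    """Return (first_gene_index, last_gene_index, BnC_copies) for each run of
    at least `min_run` consecutive genes that fall in the same non-2:2 bin."""
    classes = [dosage_class(r) for r in ratios]
    calls = []
    start = 0
    for i in range(1, len(classes) + 1):
        # A run ends at the end of the list or when the bin changes.
        if i == len(classes) or classes[i] != classes[start]:
            run_class = classes[start]
            # Assumption: the 2:2 bin (2 BnC copies) is the balanced state.
            if run_class is not None and run_class != 2 and i - start >= min_run:
                calls.append((start, i - 1, run_class))
            start = i
    return calls


if __name__ == "__main__":
    example = [0.5] * 5 + [0.7] * 12 + [0.5] * 5 + [0.1] * 4
    print(call_exchanges(example))
```

In this toy input, only the run of twelve genes in the 0.6 to 0.8 bin is long enough to be reported; the four genes near 0.1 fall below the ten-gene minimum.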
Additionally, to remove lowly expressed genes that might be noise, syntenic homoeologs were only kept if the total TPM of the pair was greater than 10. Syntenic homoeolog pairs with a log2 fold change greater than 3.5 were called BnC biased, and those with a log2 fold change less than -3.5 were called BnA biased. This cutoff follows the practice of Woodhouse et al. (2014), which used a log FC cutoff of 2 to determine homoeolog expression bias; however, to reduce false positives more confidently, a higher FC cutoff of 3.5 was used. Because lack of subgenome dominance would follow a normal distribution where deviations from 0 FC are equal in either direction, a Chi-squared goodness of fit test was carried out to test for normality. The R package UpSetR was used to identify and plot syntenic homoeologs shared by all lines for a given generation. For each generation, Arabidopsis thaliana orthologs were identified for genes showing the same subgenome bias in all six lines and the progenitors and were investigated for GO and KEGG pathway enrichment (Ashburner et al. 2000; Kanehisa and Goto 2000) in the STRING PPI network (Szklarczyk et al. 2017) using the online STRING network search application. STRING also calculated and reported average node degree, clustering coefficients, and enrichment for network interactions.
DNA methylation analysis
Whole genome bisulfite sequencing (WGBS) data was mapped to the combined in silico reference genome using methylpy v1.3.8 (Schultz et al. 2015) (see Supplementary Table X), using cutadapt v2.3 (Martin 2011) for adaptor trimming, Bowtie2 v2.3.5 (Langmead and Salzberg 2012) for alignment, and Picard tools v2.20.2 for marking duplicates. The chloroplast genome is unmethylated in plants and was used as an internal control for calculating the non-conversion rate of bisulfite treatment, that is, the percentage of unmethylated sites that fail to be converted to uracil (Lister et al. 2008). Methylpy accounts for this non-conversion in calling methylated sites. 
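As a rough illustration of the chloroplast control described above, the non-conversion rate can be estimated as the fraction of reads at chloroplast cytosines that still read as methylated (unconverted). The sketch below is not methylpy's implementation; the function name and the `(methylated_reads, total_reads)` input layout are assumptions made for the example.

```python
from typing import Iterable, Tuple


def non_conversion_rate(chloroplast_sites: Iterable[Tuple[int, int]]) -> float:
    """Estimate the bisulfite non-conversion rate from chloroplast cytosines.

    Each item is (methylated_reads, total_reads) for one cytosine. Because the
    chloroplast genome is unmethylated, every read that still shows a C is
    counted as a conversion failure.
    """
    methylated = 0
    total = 0
    for m, t in chloroplast_sites:
        methylated += m
        total += t
    if total == 0:
        raise ValueError("no chloroplast coverage to estimate non-conversion")
    return methylated / total


if __name__ == "__main__":
    sites = [(1, 200), (0, 300), (4, 500)]
    print(f"non-conversion rate: {non_conversion_rate(sites):.3%}")
```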
When the parental WGBS data was mapped to the combined genome (TO1000 + R500), a small fraction of reads of each sample mapped to the other sub-genome, ~1.3% TO1000 to B. rapa and ~6.1% IMB218 to B. oleracea. We compared results from mapping of the parental data to either the combined genome or their own respective genome. There was little difference in DNA methylation levels or patterns for either parent, and we therefore concluded that the impact of this mismapping was insignificant. As a further control, we created a 'mock' allopolyploid in silico. The TO1000 data was randomly downsampled to an equal number of read pairs as IMB218. The two datasets were combined and mapped to the combined genome to mimic an in silico allopolyploid. DNA methylation levels in this 'mock' allopolyploid were either approximately half-way between the two parents for the whole genome or nearly identical to their respective parent at a sub-genome level. If DNA methylation in the resynthesized lines is simply a combination of both parents' methylomes, then we expect global DNA methylation in the resynthesized lines to be similar to this combined mock dataset. Deviation from this pattern would indicate global remodeling of DNA methylation. Genome-wide levels of DNA methylation and DNA methylation metaplots were analyzed as previously described (Niederhuth et al. 2016) using python v3.7.3, Pybedtools v0.8 (Dale et al. 2011), and Bedtools v2.25.0 (Quinlan and Hall 2010). Genome-wide DNA methylation levels were calculated for each sequence context (CG, CHG, and CHH) using the weighted methylation level (Schultz et al. 2012), which accounts for sequencing coverage. For gene metaplots, cytosines from 2 kilobase (kb) upstream, 2 kb downstream and within the gene/TE body were extracted. 
For gene bodies, only cytosines in coding sequences were used, as the presence of TEs in introns and problems of proper UTR annotation can obscure DNA methylation at start/stop sites and introduce misleadingly high levels of DNA methylation (Niederhuth et al. 2016). Each of these three regions (upstream, gene body, and downstream) was then divided into 20 windows, and the weighted methylation level for each window was calculated and averaged for all genes. For LTR metaplots, the same analysis was performed, except all cytosines within the LTR body were included. Plots were made in R v3.6.0 (R Core Team, 2013) using ggplot2 (Wickham 2009). All code and original analyzed data and plots are available on GitHub (https://github.com/niederhuth/Replaying-the-evolutionary-tape-to-investigate-subgenome-dominance)
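The weighted methylation level and the 20-window metaplot binning described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code in the linked repository: the function names and the tuple layout are hypothetical, regions are treated as half-open intervals, and strand orientation is ignored for brevity.

```python
from typing import Iterable, List, Optional, Tuple


def weighted_methylation(sites: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Weighted methylation level: total methylated reads divided by total
    reads over all cytosines, so well-covered sites contribute more."""
    methylated = 0
    total = 0
    for m, t in sites:
        methylated += m
        total += t
    return methylated / total if total else None


def window_levels(
    sites: Iterable[Tuple[int, int, int]],
    region_start: int,
    region_end: int,
    n_windows: int = 20,
) -> List[Optional[float]]:
    """Split [region_start, region_end) into n_windows equal bins and return
    the weighted methylation level of each bin (None where no reads fall).

    Each site is (position, methylated_reads, total_reads).
    """
    length = region_end - region_start
    bins: List[List[Tuple[int, int]]] = [[] for _ in range(n_windows)]
    for pos, m, t in sites:
        if region_start <= pos < region_end:
            idx = (pos - region_start) * n_windows // length
            bins[idx].append((m, t))
    return [weighted_methylation(b) for b in bins]


if __name__ == "__main__":
    # One deeply covered, mostly unmethylated site dominates the weighted level.
    print(weighted_methylation([(8, 10), (1, 90)]))
    print(window_levels([(0, 1, 2), (99, 3, 4)], 0, 100))
```

Averaging the per-window values across all genes, as the text describes, would then give one metaplot profile per sequence context.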